References: RealClearPolitics polling averages for the 2016 Republican presidential nomination (http://www.realclearpolitics.com/epolls/2016/president/us/2016_republican_presidential_nomination-3823.html) and the 2016 Democratic presidential nomination (http://www.realclearpolitics.com/epolls/2016/president/us/2016_democratic_presidential_nomination-3824.html).
I wanted to analyze the 2016 North Carolina (NC) presidential campaign contributions.
## [1] "cmte_id" "cand_id" "cand_nm"
## [4] "contbr_nm" "contbr_city" "contbr_st"
## [7] "contbr_zip" "contbr_employer" "contbr_occupation"
## [10] "contb_receipt_amt" "contb_receipt_dt" "receipt_desc"
## [13] "memo_cd" "memo_text" "form_tp"
## [16] "file_num" "tran_id" "election_tp"
## 'data.frame': 2319 obs. of 18 variables:
## $ cmte_id : Factor w/ 14 levels "C00458844","C00500587",..: 6 6 6 6 6 5 5 5 5 5 ...
## $ cand_id : Factor w/ 14 levels "P00003392","P20002721",..: 1 1 1 1 1 4 4 4 4 4 ...
## $ cand_nm : Factor w/ 15 levels "Bush, Jeb","Carson, Benjamin S.",..: 3 3 3 3 3 11 11 11 11 11 ...
## $ contbr_nm : Factor w/ 1036 levels "ACQUAVIVA, TONY",..: 1014 604 779 495 474 257 509 444 480 149 ...
## $ contbr_city : Factor w/ 227 levels "ABERDEEN","ADVANCE",..: 52 209 115 166 224 137 150 8 58 36 ...
## $ contbr_st : Factor w/ 1 level "NC": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 277132233 273707743 287487012 273125862 271045057 281108912 278569198 288042811 286213012 282266472 ...
## $ contbr_employer : Factor w/ 400 levels "","3RC","5 STAR AWARDS",..: 238 296 238 238 238 284 177 177 218 294 ...
## $ contbr_occupation: Factor w/ 322 levels "","1ST GRADE TEACHER ASSISTANT",..: 183 295 261 261 261 261 132 132 135 193 ...
## $ contb_receipt_amt: num 100 75 112 100 500 50 250 2700 250 500 ...
## $ contb_receipt_dt : Factor w/ 126 levels "1-Apr-15","1-Jun-15",..: 108 14 14 86 13 3 117 23 117 39 ...
## $ receipt_desc : Factor w/ 12 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 19 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1015585 1015585 1015585 1015585 1015585 1015683 1015683 1015683 1015683 1015683 ...
## $ tran_id : Factor w/ 2315 levels "A020838FE8B6E4ABD8E6",..: 179 338 339 428 561 118 150 29 83 176 ...
## $ election_tp : Factor w/ 3 levels "","G2016","P2016": 3 3 3 3 3 3 3 2 3 3 ...
Candidate Names
## [1] "Bush, Jeb" "Carson, Benjamin S."
## [3] "Clinton, Hillary Rodham" "Cruz, Rafael Edward 'Ted'"
## [5] "CRUZ, RAFAEL EDWARD TED" "Fiorina, Carly"
## [7] "Graham, Lindsey O." "Huckabee, Mike"
## [9] "Jindal, Bobby" "O'Malley, Martin Joseph"
## [11] "Paul, Rand" "Perry, James R. (Rick)"
## [13] "Rubio, Marco" "Sanders, Bernard"
## [15] "Santorum, Richard J."
Count of Unique Contributors
## [1] 1036
Count of Unique Occupations
## [1] 322
Top 10 Cities and Contributor Occupations
## CHARLOTTE RALEIGH GREENSBORO CHAPEL HILL WINSTON SALEM
## 287 200 98 90 82
## ASHEVILLE DURHAM CARY WILMINGTON LELAND
## 80 80 74 50 48
## RETIRED
## 732
## NOT EMPLOYED
## 126
## INFORMATION REQUESTED PER BEST EFFORTS
## 107
## HOMEMAKER
## 101
## ATTORNEY
## 75
## INFORMATION REQUESTED
## 45
## PHYSICIAN
## 42
## ACCOUNT ASSISTANT
## 37
## BUILDING ASSISTANT MANAGER/CUSTODIAN
## 35
## CONSULTANT
## 28
Summary of Receipt Amount
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 40.0 100.0 375.9 250.0 10800.0
Receipt Descriptions
## [1] ""
## [2] "REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)"
## [3] "REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) REATTRIBUTION FROM SPOUSE"
## [4] "REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) REATTRIBUTION TO SPOUSE"
## [5] "REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) SEE REATTRIBUTION"
## [6] "REATTRIBUTION FROM SPOUSE"
## [7] "REATTRIBUTION TO SPOUSE"
## [8] "REATTRIBUTION/REDESIGNATION REQUESTED"
## [9] "REDESIGNATION FROM PRIMARY"
## [10] "REDESIGNATION TO GENERAL"
## [11] "Refund"
## [12] "SEE REATTRIBUTION"
I noticed that Ted Cruz was being counted twice, once as "Cruz, Rafael Edward 'Ted'" and once as "CRUZ, RAFAEL EDWARD TED". I used tidyr to fix this.
I also added another column called 'Days' with the date represented as a numeric value.
## 'data.frame': 2319 obs. of 19 variables:
## $ cmte_id : Factor w/ 14 levels "C00458844","C00500587",..: 13 13 13 13 13 13 13 13 13 13 ...
## $ cand_id : Factor w/ 14 levels "P00003392","P20002721",..: 12 12 12 12 12 12 12 12 12 12 ...
## $ contbr_nm : Factor w/ 1036 levels "ACQUAVIVA, TONY",..: 903 147 155 314 560 637 741 879 113 186 ...
## $ contbr_city : Factor w/ 227 levels "ABERDEEN","ADVANCE",..: 36 168 224 222 67 35 88 214 48 200 ...
## $ contbr_st : Factor w/ 1 level "NC": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 282103248 276146201 27106 284054795 287340027 275178398 272657650 281736806 280367104 273589364 ...
## $ contbr_employer : Factor w/ 400 levels "","3RC","5 STAR AWARDS",..: 245 203 285 148 284 284 296 34 317 395 ...
## $ contbr_occupation: Factor w/ 322 levels "","1ST GRADE TEACHER ASSISTANT",..: 181 318 101 145 261 261 15 179 100 86 ...
## $ contb_receipt_dt : Factor w/ 126 levels "1-Apr-15","1-Jun-15",..: 25 99 44 90 29 29 77 90 99 99 ...
## $ receipt_desc : Factor w/ 12 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 19 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 1015075 1015075 1015075 1015075 1015075 1015075 1015075 1015075 1015075 1015075 ...
## $ tran_id : Factor w/ 2315 levels "A020838FE8B6E4ABD8E6",..: 638 696 647 685 640 641 660 678 702 704 ...
## $ election_tp : Factor w/ 3 levels "","G2016","P2016": 3 3 3 3 3 3 3 3 3 3 ...
## $ cand_nm : Factor w/ 14 levels "Bush, Jeb","Carson, Benjamin S.",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ contb_receipt_amt: num 250 1350 2700 2000 250 1000 2700 2500 250 2700 ...
## $ Days : num 286 301 290 300 287 287 297 300 301 301 ...
When I looked at a summary of contribution amounts, I noticed that a number of values were negative. Looking at the receipt descriptions showed that these values actually represented refunds. With the refunds excluded, the summary was:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 50 100 397 250 10800
Excluding refunds didn't change the summary values significantly (the median stayed at 100 and the mean rose from 375.9 to 397), so I left them in.
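A minimal sketch of that comparison, assuming (as the receipt descriptions suggest) that refunds are exactly the rows with a negative `contb_receipt_amt`. The toy data frame `contributions` and its values are made up for illustration; only the column name comes from the dataset.

```r
# Toy stand-in for the contributions data (values are illustrative only)
contributions <- data.frame(
  contb_receipt_amt = c(100, 75, -2700, 250, 2700, 50, -100)
)

# Refunds appear as negative receipt amounts
is_refund <- contributions$contb_receipt_amt < 0
kept <- contributions$contb_receipt_amt[!is_refund]

# Compare the summaries with and without refunds
summary(contributions$contb_receipt_amt)
summary(kept)
```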
I decided to make two summary tables so that I could look at the data at a higher level: one for candidate information, the other for donor information.
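The candidate table can be sketched in base R by splitting the amounts by candidate and computing one statistic per column. This is an illustrative version on made-up rows; the column set (count, total, mean, median, max) is an assumption based on the statistics discussed below, not necessarily the exact table used here.

```r
# Made-up rows with the dataset's column names
contributions <- data.frame(
  cand_nm = c("Bush, Jeb", "Bush, Jeb", "Carson, Benjamin S.",
              "Carson, Benjamin S.", "Carson, Benjamin S."),
  contb_receipt_amt = c(2700, 1000, 50, 25, 100)
)

# One numeric vector of amounts per candidate
by_cand <- split(contributions$contb_receipt_amt, contributions$cand_nm)

# One row per candidate, one summary statistic per column
candidate_summary <- data.frame(
  cand_nm = names(by_cand),
  n       = vapply(by_cand, length, integer(1)),
  total   = vapply(by_cand, sum, numeric(1)),
  mean    = vapply(by_cand, mean, numeric(1)),
  median  = vapply(by_cand, median, numeric(1)),
  max     = vapply(by_cand, max, numeric(1)),
  row.names = NULL
)

# Rank candidates by total money raised
candidate_summary <- candidate_summary[order(-candidate_summary$total), ]
candidate_summary
```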
## [1] 568 6
## [1] 468 6
There are 14 candidates and 1036 unique contributors, 436 of whom contributed more than once. The maximum contribution was $10800 and the largest refund was $5400 (recorded as -$5400). The smallest contribution was $2. The middle 50% of contributions (the interquartile range) fell between $40 and $250. The median contribution was $100 and the mean contribution was $375.90. Contributor occupation varied widely, though the largest group was composed of retirees.
The largest number of contributions went to Republican Ben Carson (594), followed by Ted Cruz (425) and then Democratic candidate Hillary Clinton (414). Rick Perry came in last with only one contribution. The largest contribution ($10800) went to Ben Carson. Jeb Bush had the highest mean and median contributions, $2034 and $2700 respectively.
To begin, I plotted a basic histogram.
I could see that most contributions were clustered slightly above 0, with small numbers of larger positive and negative contributions. It was difficult to infer much beyond this, so I experimented with the binwidth and added more detail to the x axis.
The new plot showed that most contributions were clustered between 0 and 1000, with a spike just below 3000 and a small number of much larger contributions above that. (For the 2016 cycle, an individual could give a candidate at most $2700 per election, which likely explains the spike.) It also appeared that most refunds occurred around this $3000 level. Still, the outliers were hard to see because there were so few of them. I didn't want to ignore these values, however, because even though there weren't many of them, large contributions are very important in political elections. To remedy this, I applied a log10 transformation to the x axis.
This showed the clearest picture yet of how contributions were distributed. On the log scale the amounts looked roughly normally distributed, and I could see the scale and frequency of different contributions in much more detail. The only downside was that the log transformation cannot represent zero or negative values (the log of a negative number is undefined), so all of the refunds were dropped from the plot and I lost that information.
I also noticed major gaps between the contribution values, which I thought was a really interesting pattern.
Contribution values were mostly whole, round numbers, and almost all were multiples of 25. I believe this is an example of the psychological effect of anchoring: contributors tend to give familiar round numbers (for example, 1000 is a more common contribution than 980). In any case, the result was that contributions appeared to occur in discrete increments.
Because of this, and because I wanted to include refunds in my plot, I decided to group the amounts into buckets with R's cut function.
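One way to put a number on this observation is to count how many amounts are exact multiples of $25. A sketch on illustrative amounts (not taken from the FEC file); on the real data the same expression would be applied to `contb_receipt_amt`.

```r
# Illustrative contribution amounts (not from the FEC file)
amounts <- c(25, 50, 100, 250, 1000, 2700, 35, 980, 10, 500)

# TRUE where an amount is an exact multiple of $25
is_multiple_of_25 <- amounts %% 25 == 0

# Share of contributions that land on $25 increments
share_25 <- mean(is_multiple_of_25)
share_25
```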
## (-6000,-3000] (-3000,0] (0,5] (5,10] (10,25]
## 1 21 70 110 303
## (25,50] (50,100] (100,250] (250,500] (500,1000]
## 399 511 431 188 86
## (1000,2750] (2750,5500] (5500,11000]
## 177 18 4
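A sketch of the bucketing step. The break points are read off the interval labels printed above; `dig.lab = 5` is an assumption about how the plain-number labels were produced (with `cut()`'s default of 3 significant digits, 11000 would be labeled `1.1e+04`). The amounts are illustrative.

```r
# Bucket boundaries read off the interval labels printed above
breaks <- c(-6000, -3000, 0, 5, 10, 25, 50, 100, 250, 500, 1000, 2750, 5500, 11000)

# Illustrative amounts, including two refunds
amounts <- c(-5400, -50, 2, 25, 100, 2700, 10800)

# cut() builds right-closed intervals (a, b]; dig.lab = 5 keeps labels such as
# "(5500,11000]" in plain notation instead of scientific notation
buckets <- cut(amounts, breaks = breaks, dig.lab = 5)

# Count of contributions per bucket
table(buckets)
```

Bucketing keeps refunds on the same axis as positive contributions, which the log10 scale could not do.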
I figured it would also make sense to create a plot showing how the candidates ranked by number of contributions received.
We can see that only 3 of the 14 candidates (Ben Carson, Ted Cruz, and Hillary Clinton) received more than 400 contributions since the start of their campaigns.
I also thought it would be interesting to see how contbr_employer and contbr_occupation are distributed across the different candidates.
In the process, I decided it would make sense to start excluding outlier candidates. My choice of which candidates to exclude was somewhat subjective; however, I used RealClearPolitics as a point of reference to judge which candidates were relevant.
We can see that retirees make up, by far, the largest number of contributors. The remaining contributors are spread thinly across employers and occupations: the unemployed, blank entries, and homemakers make up a large portion of the remaining occupations, and no single employer has a significant impact on contributions.
In this dataset there were 2319 contributions, made to 14 different candidates by 1036 unique contributors. 436 contributors gave money more than once. The largest amount contributed in this dataset was $10800, given to Ben Carson. The biggest refund was for $5400, refunded by Ted Cruz. The smallest amount contributed was $2, and, excluding refunds, the interquartile range of donations was $50 to $250. Again excluding refunds, the median contribution was $100 and the mean contribution was $397. The occupation of contributors varied widely, with 322 unique values listed. Republican Ben Carson received the largest number of contributions (594), followed by fellow Republican Ted Cruz (425), and then Democratic candidate Hillary Clinton (414). Rick Perry came in last with only one contribution made to his campaign. Republican Jeb Bush had the highest mean and median contributions, $2034 and $2700 respectively.
There were 18 original features in the dataset. These were:
[cmte_id, cand_id, cand_nm, contbr_nm, contbr_city, contbr_st, contbr_zip, contbr_employer, contbr_occupation, contb_receipt_amt, contb_receipt_dt, receipt_desc, memo_cd, memo_text, form_tp, file_num, tran_id, election_tp]
The main features of interest in my dataset are cand_nm and contb_receipt_amt. Most of what I am interested in investigating relates to the success of the candidates, and I believe the frequency and scale of contributions play a large role in this.
Other features that might be worth looking at are contbr_nm, contbr_city, and contb_receipt_dt. It would be interesting to see whether something like the 80-20 rule applies here, with 80% of contributions being given by 20% of contributors. It would also help to see where contributions were coming from, to check whether certain areas were more valuable for campaigning. Finally, it would make sense to look at when donations occurred, to see whether certain candidates have been trending up or down as their campaigns progressed.
I created a variable called contb_receipt_amt.Bucket from contb_receipt_amt using R's cut function, to get around the fact that most donations occurred in seemingly discrete increments. I also added a column called 'Days' with the date represented as a numeric value (I subtracted the minimum date from all dates in the vector), so that it would be easier to plot this information later on.
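A minimal sketch of the `Days` calculation, assuming the receipt dates follow the `1-Apr-15` pattern seen in the `str()` output. Setting the `LC_TIME` locale to `"C"` is an added assumption so that `%b` matches English month abbreviations on any system; the three dates are illustrative.

```r
# Make %b parse English month abbreviations regardless of the system locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Receipt dates in the same "day-Mon-yy" style as contb_receipt_dt (illustrative)
receipt_dt <- c("1-Apr-15", "15-Jun-15", "23-Mar-15")

# Parse day, abbreviated month name, and two-digit year
dates <- as.Date(receipt_dt, format = "%d-%b-%y")

# Days elapsed since the earliest receipt date
Days <- as.numeric(dates - min(dates))
Days
```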
When the data was first loaded, Ted Cruz had two cand_nm levels devoted to him, which was a mistake. To fix this, I used tidyr's spread function to turn those levels into columns, then added the values of the extra column to the values of the first one. I then gathered the updated columns and deleted the extra one.
I decided to visualize some of the summary table information. The first question I wanted to answer was "Who received the most money?"
It turns out that while Ben Carson was still in the lead among Republicans, Hillary Clinton was actually the candidate who had raised the most money so far.
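The same merge can also be done without reshaping, by relabeling the duplicate spelling and re-creating the factor. This is a base-R alternative sketch, not the spread/gather route described above, shown on a toy vector:

```r
# Toy candidate column containing both spellings of the same candidate
cand_nm <- factor(c("Cruz, Rafael Edward 'Ted'", "CRUZ, RAFAEL EDWARD TED", "Bush, Jeb"))

# Work on character values and map the duplicate spelling onto the canonical one
nm <- as.character(cand_nm)
nm[nm == "CRUZ, RAFAEL EDWARD TED"] <- "Cruz, Rafael Edward 'Ted'"

# Re-create the factor so the unused level disappears
cand_nm <- factor(nm)
levels(cand_nm)
```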
Let's see how the candidates ranked in terms of mean donation.
It looks like a close contest between Jeb Bush, Mike Huckabee, Lindsey Graham, and Martin O'Malley. That's really strange: O'Malley has been polling close to 0 since the start of the race. I wonder what the plot would look like if we used median contributions instead.
It's the same 4 candidates as before, except now Jeb Bush is blowing everybody out of the water. Also interesting is that Ben Carson, who led the pack in number of donations and maximum donation, is now closer to the tail of the plot.
Let's see how maximum donations play into the mix.
OK, that's really interesting. Except for Mike Huckabee, the top 4 leaders in mean and median aren't the leaders in largest donations. Ben Carson actually leads here, which is odd, since he ranked in the middle for mean donations and large donations should have pulled his mean upwards. Maybe it was just a fluke. In any case, there was only one way to be sure: time for a boxplot.
This makes more sense. It looks like most of Jeb Bush's donations were above the $2500 mark, while Ben Carson's donations were mostly much smaller. Ben did get the largest donation, and he has a few outliers, but the great majority of his support has come from small donations. For Martin O'Malley, we can now see that he hasn't had many donations, and the ones he has had were mostly small. Because he has so few donations, the few larger ones he did get probably pulled his mean upwards; with so few observations, the mean is very sensitive to individual values.
Let's quickly look at where this money is coming from geographically.
Looks like most of the money is coming from Charlotte and Raleigh.
We talked about the 80-20 rule; let's make another plot to see whether it's actually playing out here.
Wow. Whoever Mr. Harrison J Frank III is, I'd like to meet him, since he apparently had the disposable income to more than double the contribution of the next biggest contributor. In terms of the 80-20 rule, the top 20% of contributors (the top 207) made up around 67.4% of total contributions, while the top 3% of contributors (30) made up just over 20% of total contributions.
So this is a BIG skew.
Sum of Total Contributions
## [1] 871792.3
Sum of Top 20% (207) Contributors
## [1] 587372.3
Share of Total Contributions Made Up by the Top 20% of Contributors
## [1] 0.6737525
Sum of Top 30 Contributors
## [1] 177679.5
Share of Total Contributions Made Up by the Top 3% (30) of Contributors
## [1] 0.2038094
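A minimal sketch of the top-share calculation on made-up per-contributor totals. `top_share` is a hypothetical helper, and rounding the group size is an assumption; with 1036 contributors, `round(0.2 * 1036)` gives the 207 used above.

```r
# Illustrative per-contributor totals (made-up values)
totals <- c(5000, 2700, 1000, 500, 250, 100, 100, 50, 25, 25)

# Fraction of all money given by the top `frac` of contributors
top_share <- function(totals, frac) {
  sorted <- sort(totals, decreasing = TRUE)
  # Size of the top group, rounded to a whole number of contributors
  k <- round(frac * length(sorted))
  sum(sorted[seq_len(k)]) / sum(sorted)
}

top_share(totals, 0.2)  # top 2 of 10 contributors
```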
The last thing I really wanted to look at in this section was how each contributor's number of contributions compared to their total contributions.
I could see that the highest concentration of donors were one-off contributors, and that the amounts they gave varied widely.
Rather than attempt a subjective interpretation, I decided it might make sense to perform a linear regression.
##
## Call:
## lm(formula = newlog(TotalContributions) ~ FreqOfContributions,
## data = Contributions.Summary.Con)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0238 -0.5165 -0.3059 0.8698 4.2222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.05204 0.04875 124.138 <2e-16 ***
## FreqOfContributions -0.01412 0.01404 -1.006 0.315
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.199 on 1034 degrees of freedom
## Multiple R-squared: 0.000977, Adjusted R-squared: 1.086e-05
## F-statistic: 1.011 on 1 and 1034 DF, p-value: 0.3148
The coefficient for FreqOfContributions was very close to zero and its p-value was high (0.315). This meant we could not reject the null hypothesis that the coefficient on FreqOfContributions is zero, so there did not appear to be a linear relationship between these two values. The R-squared of about 0.001 points the same way.
Now I wanted to circle back to contributor occupation and contributor employer. This time, however, I added the extra dimension of breaking these down by candidate.
Adding candidate name as a dimension doesn't add a great deal of helpful information. One thing we can see now, however, is that Ben Carson is getting a huge number of contributions from retirees. This is interesting given that, as noted earlier, most of Ben Carson's contributions were given in small amounts. Retirees often have limited income, so it makes sense that when they contribute, they give small sums.
The strongest relationship I saw was between mean donation and median donation. Jeb Bush was in the lead on both, despite not having the most donations or the largest donations.
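A sketch of how the per-contributor table and the model could be set up. The report's `newlog` helper is not shown, so the sketch substitutes the natural log `log()` and drops non-positive totals, both assumptions; the contributor rows are made up, and the column names mirror those in the `lm` call above.

```r
# Made-up contribution records: one row per contribution
contribs <- data.frame(
  contbr_nm = c("A", "A", "B", "C", "C", "C", "D"),
  amt       = c(100, 150, 2700, 50, 50, 25, 500)
)

# One row per contributor: number of contributions and their total
freq  <- tapply(contribs$amt, contribs$contbr_nm, length)
total <- tapply(contribs$amt, contribs$contbr_nm, sum)
per_contributor <- data.frame(
  FreqOfContributions = as.vector(freq),
  TotalContributions  = as.vector(total)
)

# Keep positive totals only so the log is defined (net refunds would break it)
per_contributor <- subset(per_contributor, TotalContributions > 0)

# Regress log total on the number of contributions
fit <- lm(log(TotalContributions) ~ FreqOfContributions, data = per_contributor)
summary(fit)
```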
It looks like at this point in the campaign season it may be too early for money to matter. Another possibility is that most of the contributions that matter are not listed in the Federal Election Commission data, because they have gone to super PACs instead. After playing around with mean, median, max, frequency, and total sum of donations, I can say that the same 4 candidates kept popping up in different orders: Hillary Clinton, Ben Carson, Ted Cruz, and Jeb Bush. 3 of these 4 ranked highly in terms of frequency of donations; Jeb Bush was the only one who didn't score highly there. This was probably offset by the size of his donations, which were on average much larger than his peers'. Perhaps the strongest claim the data supports is that there are only around 4 or 5 credible candidates listed in the dataset, and that everybody below this threshold, for the most part, performs poorly.
I saw that the most money came out of Charlotte.
I also saw that a disproportionate amount of money came from a small number of super contributors. It blew my mind that 3% of donors made up 20% of total contributions.
The relationship between median and mean donations: Jeb Bush was solidly in the lead on both.
## [1] "Bush, Jeb" "Carson, Benjamin S."
## [3] "Clinton, Hillary Rodham" "Cruz, Rafael Edward 'Ted'"
## [5] "Fiorina, Carly" "Graham, Lindsey O."
## [7] "Huckabee, Mike" "Jindal, Bobby"
## [9] "O'Malley, Martin Joseph" "Paul, Rand"
## [11] "Perry, James R. (Rick)" "Rubio, Marco"
## [13] "Sanders, Bernard" "Santorum, Richard J."
I wanted to look at how contributions rose or fell for candidates over time. Immediately, however, I noticed a problem: there are 14 candidates. This is a large number for a legend, and the problem is compounded by the fact that the colors are effectively impossible to tell apart.
To mitigate this issue, I went back to using the contributor dataset that excluded outlier candidates. I also changed the y axis to a log10 scale and used cumsum to calculate the y value as the amount collected to date, instead of trying to track hundreds of individual contributions.
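A sketch of the running-total calculation with base R's `ave()`, on made-up rows. Sorting by candidate and `Days` before calling `cumsum` is what makes each value the amount collected to date; the short candidate labels and the `amt` column name are placeholders.

```r
# Made-up rows: candidate, day index, and contribution amount
df <- data.frame(
  cand_nm = c("Bush", "Carson", "Bush", "Carson", "Bush"),
  Days    = c(300, 100, 290, 150, 310),
  amt     = c(2700, 50, 1000, 25, 500)
)

# Sort by candidate, then by day, so totals accumulate in time order
df <- df[order(df$cand_nm, df$Days), ]

# Running total of money collected to date, computed within each candidate
df$cum_amt <- ave(df$amt, df$cand_nm, FUN = cumsum)
df
```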
This is great. We can see that Rand Paul started off with a huge head start, raising funds well ahead of everybody else. However, once other candidates began to enter the race, his trajectory faded into the middle of the pack. Ben Carson spiked to the lead immediately after he entered the race, but was overtaken by Hillary Clinton just as quickly once she entered. Bush, despite entering last, has the steepest trajectory and is a very close third. If this trend continues, he is on track to pass both Carson and Clinton and take the lead, given only a little more time.
Again we can see Jeb Bush in the lead for mean and median. It's also interesting that the average contribution amount for Rand Paul dropped significantly almost immediately after Ted Cruz entered the race.
Interesting. It seems that in the cities with the largest donations, the most money is going to Hillary Clinton and Ben Carson respectively, with Jeb Bush coming in third.
Jeb Bush again won hands down when we looked at mean and median contributions over time. We also saw that part of the reason for this is that he almost always receives mid-level donations. Another interesting point is that the same three candidates that received the most total contributions come up in the same ranking when looking at contributions by location: Hillary Clinton in 1st place, Ben Carson in 2nd, and Jeb Bush in 3rd.
I am not confident, based on this data, that I will be able to build an effective linear model for campaign donations. There are just too many outside variables to take into consideration (media, population, debates). Other than the location of contributors, I think the issues driving the success in donations for these candidates are factors that aren't represented in this dataset.
I was surprised that Jeb Bush started receiving money so much later than the other candidates, and Rand Paul so early. It was also interesting that the mean donations for Rand Paul were initially much higher, before other candidates started receiving funds, with a significant drop-off afterwards. Finally, it was interesting to see how much location plays a role in fundraising. It does not seem like a coincidence that the same three candidates leading in total contributions are the same three leading in amounts contributed from the big-money cities (and in the same order as well!).
Jeb Bush started raising funds last, but was able to jump to third place in total funds raised. Clinton has been in the lead since she started raising cash, which is startling given that her median contribution amount is so small. Also, Rand Paul, despite getting contributions 6 months before anybody else, has fallen to the middle of the pack in both his total contributions and his median contributions.
Again we can see that while Jeb Bush is the weakest of the 'frontrunners' in number of donations, the contributions he does have are quite large. This differentiates him from the remaining candidates, whose contribution distributions are centered further to the left and follow broadly similar patterns.
Ben Carson's contributions have the distribution closest to normal when plotted with log10 buckets on the x axis. Jeb Bush's distribution is centered further to the right, in the $1000-$2750 range, with a number of larger and smaller donations above and below this, and the total frequency of his contributions seems to be lower. Bernie Sanders actually seems to mimic Ben Carson in the frequency and scale of his contributions. However, it's easy to see why Hillary has been so successful at fundraising, as she is represented strongly across all of the buckets.
The Contributions dataset contains information on 14 presidential candidates and all of their registered contributors in the state of North Carolina.
I started my analysis by understanding the individual variables and how their levels were structured. I then processed the dataset with tidyr to fix an issue with one of the candidates being split into two different levels. I began plotting by creating a set of histograms, each incrementally better than the one before it. These incremental fixes included changing the binwidth, changing the x axis from a linear scale to a logarithmic one, and finally using R's cut function to group the contribution amounts into buckets. This last step was done because I noticed gaps between contribution amounts that indicated a behavioral phenomenon similar to anchoring (for example, a preference for round numbers). After this, I broke the contribution frequencies down by occupation and employer and saw that retirees and non-working contributors made up by far the largest segment of contributors.
Another step I took was to look at summary statistics by candidate. Jeb Bush stood out: despite receiving contributions the latest and receiving the fewest contributions of all the frontrunner candidates, the mean and median contributions he did receive were far larger than those of his peers. Hillary Clinton was the strongest candidate in terms of total contributions, though Ben Carson, a political outsider in many ways, performed strongly. Ben Carson, who received the greatest number of contributions and ranked second in the sum of contributions received, was interesting because this success seemed to come primarily from a large number of small donations. A breakdown of contributor occupation by candidate confirms this trend: effectively half of the retirees who donated gave their money to Ben Carson. Retirees are generally older and often have less disposable income, so it makes sense that they would have more time to focus on political activity, though less money to actually send in a check.
I was unable to create a linear model to predict total contributions for each candidate, as the data was too variable and did not represent many key drivers of campaigning. Maybe later in the campaign it would make more sense, but right now I think it is too early. One thing that would make this task easier is access to the list of major super PACs working on behalf of the different candidates, or even the amount spent by each candidate. An added benefit of this last approach would be that I could include Republican front-runner Donald Trump in this analysis. This is currently impossible because he is self-financing his campaign, and thus would not be receiving contributions reported to the FEC.
In the final three plots we gained even more information. Before this could be done, however, I had to exclude outlier (the weakest) candidates from the dataset. This pruning was done by identifying the weakest candidates according to the Republican and Democratic polling averages on the poll aggregation website RealClearPolitics. We found that, in addition to performing well in total, mean, and median contributions raised, Jeb Bush actually started raising funds the latest of all the candidates. Conversely, Rand Paul, who started raising funds the earliest, saw his metrics fall back to the rest of the pack almost as soon as other candidates started entering the race.
In the density plot and contribution histogram we saw complementary information: while Jeb Bush is the weakest of the 'frontrunners' in number of donations, the contributions he does have are significantly larger than his competitors', a fact that differentiated him.
Finally, in the dodged histogram we saw that the contributions for each candidate had different patterns and distributions. Ben Carson's contributions have the distribution closest to normal when plotted with log10 buckets on the x axis, while Jeb Bush received almost all of his contributions in the $1000 to $2750 range (though he had a small number of contributions that were larger or smaller than this). Hillary Clinton and Bernie Sanders seemed to mimic Ben Carson's populist pattern (large numbers of small donations), though Hillary was represented more strongly in the larger-valued buckets. For instance, she was effectively equal to Bush in the number of contributions in the $1000 to $2750 range, despite also being strongly represented among small donors. Because of this last point, and because I am inferring that this broad representation in contributions equates to broad representation among different contributor segments, I came away with the conclusion that Hillary Clinton was the strongest candidate overall of those in this dataset.